Today we will focus mainly on methods for analyzing and forecasting regular time-series data with seasonal patterns.
By the end of this workshop, you probably won’t become an expert in time series analysis and forecasting, but you will be able to:
All of today’s slides, code, and R Markdown files are available on GitHub.
Downloading the workshop material from the terminal:
git clone https://github.com/RamiKrispin/Time-Series-Workshop.git
Or launch it from a Docker container:
Time series analysis is commonly used in many fields of science, such as economics, finance, physics, engineering, and astronomy. The use of time series analysis to understand past events and to predict future ones did not start with the introduction of stochastic processes during the past century. Ancient civilizations such as the Greeks, Romans, and Mayans studied cyclical events, such as weather patterns and astronomical cycles, and learned to use them to predict future events.
Time series analysis is the art of extracting meaningful insights from time-series data in order to learn about past events and to predict future ones.
This process includes the following steps:
Generally, in R this process will look like this:
Of course, other great packages could also be part of this process, such as zoo, xts, bsts, forecastHybrid, and TSstudio.
Time series data is a sequence of values, each associated with a unique point in time. It can be divided into the following two groups:
Note: the term time series data typically refers to regular time-series data. Therefore, unless stated otherwise, the term time series (or series) refers to regular time-series data throughout the workshop.
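One way to make the distinction between regular and irregular series concrete is to check whether the gaps between consecutive timestamps are all equal. The following is a minimal sketch in base R; the helper `is_regular_index` and the sample dates are hypothetical and not part of the workshop material.

```r
# Hypothetical helper: TRUE when all consecutive timestamps are equally spaced
is_regular_index <- function(timestamps) {
  gaps <- diff(as.numeric(timestamps))
  length(unique(gaps)) <= 1
}

# Equally spaced daily observations: a regular series
regular_dates <- seq(as.Date("2020-01-01"), by = "day", length.out = 5)

# Observations captured at uneven intervals: an irregular series
irregular_dates <- as.Date(c("2020-01-01", "2020-01-02",
                             "2020-01-05", "2020-01-09"))

is_regular_index(regular_dates)
is_regular_index(irregular_dates)
```

This simple check only suits fixed-length intervals (seconds, hours, days). Calendar intervals such as months differ in their number of days, so a monthly series would need a calendar-aware comparison instead.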
With time series analysis, you can answer questions such as:
There are multiple classes in R for time-series data. The most common are:
- ts class for regular time-series data, and mts class for multiple time series objects; these are the most common classes for time series data
- xts and zoo classes for both regular and irregular time series data, mainly popular in the financial field
- tsibble class, a tidy format for time series data that supports both regular and irregular time-series data

A typical time series object should have the following attributes:
Here, the frequency of the series is the number of observations in each cycle. For example, for a monthly series, the frequency units are the months of the year, and the cycle units are the years. Similarly, for a daily series, the frequency units could be the days of the year, and the cycle units are again the years.
The stats package provides a set of functions for handling and extracting information from a ts object. The frequency function returns the frequency of the series, and the cycle function returns the position of each observation within its cycle. Let’s load the USgas series from the TSstudio package and apply those functions:
library(TSstudio)
data(USgas)
class(USgas)
## [1] "ts"
is.ts(USgas)
## [1] TRUE
frequency(USgas)
## [1] 12
cycle(USgas)
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 2000 1 2 3 4 5 6 7 8 9 10 11 12
## 2001 1 2 3 4 5 6 7 8 9 10 11 12
## 2002 1 2 3 4 5 6 7 8 9 10 11 12
## 2003 1 2 3 4 5 6 7 8 9 10 11 12
## 2004 1 2 3 4 5 6 7 8 9 10 11 12
## 2005 1 2 3 4 5 6 7 8 9 10 11 12
## 2006 1 2 3 4 5 6 7 8 9 10 11 12
## 2007 1 2 3 4 5 6 7 8 9 10 11 12
## 2008 1 2 3 4 5 6 7 8 9 10 11 12
## 2009 1 2 3 4 5 6 7 8 9 10 11 12
## 2010 1 2 3 4 5 6 7 8 9 10 11 12
## 2011 1 2 3 4 5 6 7 8 9 10 11 12
## 2012 1 2 3 4 5 6 7 8 9 10 11 12
## 2013 1 2 3 4 5 6 7 8 9 10 11 12
## 2014 1 2 3 4 5 6 7 8 9 10 11 12
## 2015 1 2 3 4 5 6 7 8 9 10 11 12
## 2016 1 2 3 4 5 6 7 8 9 10 11 12
## 2017 1 2 3 4 5 6 7 8 9 10 11 12
## 2018 1 2 3 4 5 6 7 8 9 10 11 12
## 2019 1 2 3 4 5 6 7
The time function returns the series index or timestamp:
head(time(USgas))
## [1] 2000.0000 2000.0833 2000.1667 2000.2500 2000.3333 2000.4167
The deltat function returns the length of the series’ time interval (which is equivalent to 1/frequency):
deltat(USgas)
## [1] 0.083333333
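The relationship between deltat and frequency can be checked directly. The sketch below uses the AirPassengers series that ships with base R (an assumption made so the example runs without TSstudio); the same calls apply to USgas.

```r
# AirPassengers is a monthly ts object included with base R (datasets package)
x <- AirPassengers

# deltat is the reciprocal of the frequency
isTRUE(all.equal(deltat(x), 1 / frequency(x)))

# The gap between consecutive index values equals deltat
gaps <- diff(as.numeric(time(x)))
isTRUE(all.equal(gaps, rep(deltat(x), length(x) - 1)))
```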
Similarly, the start and end functions return the starting and ending time of the series, respectively:
start(USgas)
## [1] 2000 1
end(USgas)
## [1] 2019 7
In both outputs, the left number is the cycle unit and the right number is the frequency unit of the series. The tsp function returns the start and end of the series, in decimal form, together with its frequency:
tsp(USgas)
## [1] 2000.0 2019.5 12.0
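The decimal values returned by tsp follow from the start and end pairs: each one is the cycle plus (frequency unit - 1) / frequency. Here is a small sketch with a synthetic series of the same shape as USgas (235 monthly values starting in January 2000). The values themselves are placeholders, and `to_decimal` is a hypothetical helper.

```r
# Placeholder series: 235 monthly observations starting January 2000
x <- ts(seq_len(235), start = c(2000, 1), frequency = 12)

# Hypothetical helper: convert a c(cycle, unit) pair to the decimal index
to_decimal <- function(pair, freq) pair[1] + (pair[2] - 1) / freq

to_decimal(start(x), frequency(x))  # January 2000
to_decimal(end(x), frequency(x))    # July 2019
tsp(x)
```

July is the seventh unit of the cycle, so the end of the series is 2019 + 6/12 = 2019.5, which matches the second value of the tsp output.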
Last but not least, the ts_info function from the TSstudio package returns a concise summary of the series:
ts_info(USgas)
## The USgas series is a ts object with 1 variable and 235 observations
## Frequency: 12
## Start time: 2000 1
## End time: 2019 7
The ts function creates a ts object from a single vector, or an mts object from multiple vectors (or a matrix). Given the start (or end) and the frequency of the series, the function generates the object's index. In the following example we will load the US_indicators dataset from the TSstudio package and convert it to an mts object. US_indicators is a data.frame with the monthly vehicle sales and unemployment rate in the US since 1976:
data(US_indicators)
head(US_indicators)
## Date Vehicle Sales Unemployment Rate
## 1 1976-01-31 885.2 8.8
## 2 1976-02-29 994.7 8.7
## 3 1976-03-31 1243.6 8.1
## 4 1976-04-30 1191.2 7.4
## 5 1976-05-31 1203.2 6.8
## 6 1976-06-30 1254.7 8.0
mts_obj <- ts(data = US_indicators[, c("Vehicle Sales", "Unemployment Rate")],
start = c(1976, 1),
frequency = 12)
ts_info(mts_obj)
## The mts_obj series is a mts object with 2 variables and 524 observations
## Frequency: 12
## Start time: 1976 1
## End time: 2019 8
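The same call works on any data frame or matrix with equally spaced rows. The sketch below uses a small synthetic data frame (the column names and values are invented for illustration) and also shows that supplying end instead of start yields the same index.

```r
# Synthetic monthly data: two placeholder series, 24 rows
df <- data.frame(sales = seq(100, by = 5, length.out = 24),
                 rate  = seq(8, by = -0.1, length.out = 24))

# Anchor the index at the first observation...
from_start <- ts(data = df, start = c(2018, 1), frequency = 12)

# ...or at the last one; ts() derives the other end from the row count
from_end <- ts(data = df, end = c(2019, 12), frequency = 12)

inherits(from_start, "mts")
start(from_end)
isTRUE(all.equal(tsp(from_start), tsp(from_end)))
```

Anchoring at the end can be convenient when the date of the most recent observation is known but the first one is not.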
| Series Type | Cycle Units | Frequency Units | Frequency |
|---|---|---|---|
| Quarterly | Years | Quarter of the year | 4 |
| Monthly | Years | Month of the year | 12 |
| Weekly | Years | Week of the year | 52 |
| Daily | Years | Day of the year | 365 |
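For a daily series, the start argument takes the year and the day of the year. The following is a minimal sketch, assuming a frequency of 365 as in the table and placeholder values. Note that this setting ignores leap years.

```r
# Ten placeholder daily observations starting on 1 March 2019
dates  <- seq(as.Date("2019-03-01"), by = "day", length.out = 10)
values <- seq_along(dates)

# Day of the year of the first observation (%j gives 001-366)
first_day <- as.integer(format(dates[1], "%j"))

daily <- ts(values,
            start = c(as.integer(format(dates[1], "%Y")), first_day),
            frequency = 365)

start(daily)
frequency(daily)
```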
What if you have more granular time series data, such as half-hourly, 15-minute, or five-minute intervals?
Below: me when I need to work with a daily time series using a ts object:
The ts object was designed to work with monthly, quarterly, or yearly series that have only two time components (e.g., year and month). Yet, more granular (high-frequency) series may have more than two time components. A common example is a daily series that has the following time attributes:
Going down to the hourly or minute level adds even more components, such as the hour and the minute.
The zoo and xts classes, and now the tsibble class, provide a solution to this issue.
“The tsibble package provides a data infrastructure for tidy temporal data with wrangling tools…”
In other words, a tsibble object lets you work with a data-frame-like object (i.e., a tbl object) that is aware of time. The key characteristics of this class:
- tbl object: you can apply any of the usual tools for reformatting, cleaning, or modifying a tbl object, such as dplyr functions

The reaction of my colleagues and me when the tsibble package was released:
library(UKgrid)
data(UKgrid)
class(UKgrid)
## [1] "data.frame"
head(UKgrid)
## TIMESTAMP ND I014_ND TSD I014_TSD ENGLAND_WALES_DEMAND
## 1 2011-01-01 00:00:00 34606 34677 35648 35685 31058
## 2 2011-01-01 00:30:00 35092 35142 36089 36142 31460
## 3 2011-01-01 01:00:00 34725 34761 36256 36234 31109
## 4 2011-01-01 01:30:00 33649 33698 35628 35675 30174
## 5 2011-01-01 02:00:00 32644 32698 34752 34805 29253
## 6 2011-01-01 02:30:00 32092 32112 34134 34102 28688
## EMBEDDED_WIND_GENERATION EMBEDDED_WIND_CAPACITY
## 1 484 1730
## 2 520 1730
## 3 520 1730
## 4 512 1730
## 5 512 1730
## 6 464 1730
## EMBEDDED_SOLAR_GENERATION EMBEDDED_SOLAR_CAPACITY NON_BM_STOR
## 1 0 79 0
## 2 0 79 0
## 3 0 79 0
## 4 0 79 0
## 5 0 79 0
## 6 0 79 0
## PUMP_STORAGE_PUMPING I014_PUMP_STORAGE_PUMPING FRENCH_FLOW BRITNED_FLOW
## 1 60 67 1939 0
## 2 16 20 1939 0
## 3 549 558 1989 0
## 4 998 997 1991 0
## 5 1126 1127 1992 0
## 6 1061 1066 1992 0
## MOYLE_FLOW EAST_WEST_FLOW I014_FRENCH_FLOW I014_BRITNED_FLOW
## 1 -382 0 1922 0
## 2 -381 0 1922 0
## 3 -382 0 1974 0
## 4 -381 0 1975 0
## 5 -382 0 1975 0
## 6 -381 0 1975 0
## I014_MOYLE_FLOW I014_EAST_WEST_FLOW
## 1 -382 0
## 2 -381 0
## 3 -382 0
## 4 -381 0
## 5 -382 0
## 6 -381 0
library(dplyr)
uk_grid <- UKgrid %>%
dplyr::select(time = TIMESTAMP,
net_demand = ND,
wind_gen = EMBEDDED_WIND_GENERATION,
solar_gen = EMBEDDED_SOLAR_GENERATION)
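A data frame in this shape can then be converted to a tsibble by naming its time column as the index. The sketch below is not taken from the workshop code: it uses a six-row synthetic stand-in for uk_grid (values copied from the head(UKgrid) output above) and assumes the tsibble and dplyr packages are installed.

```r
library(dplyr)
library(tsibble)

# Six half-hourly rows standing in for uk_grid
uk_sample <- data.frame(
  time = seq(as.POSIXct("2011-01-01 00:00:00", tz = "UTC"),
             by = "30 min", length.out = 6),
  net_demand = c(34606, 35092, 34725, 33649, 32644, 32092)
)

# Declare the time column as the index of the tsibble
uk_tsbl <- as_tsibble(uk_sample, index = time)

class(uk_tsbl)
interval(uk_tsbl)

# Regular dplyr verbs still apply
uk_tsbl %>% filter(net_demand > 34000)
```

With real data, as_tsibble raises an error if the index contains duplicated timestamps, so a full dataset may need a de-duplication step first.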
As in most fields of statistics and machine learning, the goal of descriptive analysis is to reveal meaningful insights about the series and the key components of its structure.
ts_decompose(USgas)
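To see the numbers behind such a plot, the classical decomposition from the stats package can be inspected directly. This is a sketch under two assumptions: that an additive decomposition is wanted, and that the base R AirPassengers series is an acceptable stand-in for USgas so the example runs without TSstudio.

```r
# Classical additive decomposition of a monthly series
dec <- decompose(AirPassengers, type = "additive")

# Components: trend (centered moving average), seasonal, and random (remainder)
head(dec$seasonal, 12)

# The moving average leaves half a cycle undefined at each end of the trend
sum(is.na(dec$trend))

# In the additive model the components sum back to the original series
rebuilt <- dec$trend + dec$seasonal + dec$random
keep <- !is.na(rebuilt)
isTRUE(all.equal(as.numeric(rebuilt[keep]), as.numeric(AirPassengers[keep])))
```

The seasonal component repeats the same 12 values every year, which is why printing a single cycle is enough to read off the estimated monthly effects.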